Big Data versus the Crowd: Looking for Relationships in All the Right Places
Abstract
Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower-quality) labeled data: distant supervision and crowdsourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowdsourced labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.
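To make the two label sources and the quality metric concrete, here is a minimal Python sketch of the generic distant-supervision heuristic: if a sentence mentions both entities of a known knowledge-base fact, label it as expressing that fact's relation. It then scores the resulting labels with precision, recall, and F1 against a hand-annotated gold set, which is the role crowd or expert labels would play. This is an illustration under stated assumptions, not the authors' system: entity mentions are found by plain substring matching, and the function names `distant_labels` and `precision_recall_f1`, the toy knowledge base, and the sentences are all hypothetical.

```python
from typing import List, Set, Tuple

# A knowledge-base fact: (subject entity, relation name, object entity).
Fact = Tuple[str, str, str]


def distant_labels(sentences: List[str], kb: List[Fact]) -> Set[Tuple[int, str]]:
    """Label sentence i with relation r whenever it mentions both entities of a KB fact.

    Simplifying assumption: entity mentions are detected by plain substring
    matching; a real pipeline would use named-entity recognition and linking.
    """
    labels: Set[Tuple[int, str]] = set()
    for i, sentence in enumerate(sentences):
        for subj, rel, obj in kb:
            if subj in sentence and obj in sentence:
                labels.add((i, rel))
    return labels


def precision_recall_f1(predicted: Set, gold: Set) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 (the harmonic mean of P and R)."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy knowledge base and corpus, for illustration only.
    kb = [("Barack Obama", "born_in", "Honolulu")]
    corpus = [
        "Barack Obama was born in Honolulu.",       # correct positive
        "Barack Obama gave a speech in Honolulu.",  # noisy label: no birth fact expressed
        "Honolulu is the capital of Hawaii.",       # mentions only one entity, so unlabeled
    ]
    predicted = distant_labels(corpus, kb)
    gold = {(0, "born_in")}  # hypothetical human annotation of the same corpus
    print(sorted(predicted))                     # [(0, 'born_in'), (1, 'born_in')]
    print(precision_recall_f1(predicted, gold))  # (0.5, 1.0, 0.6666666666666666)
```

The second sentence shows why distant supervision is cheap but noisy: the heuristic labels it `born_in` even though no birth fact is stated, which lowers precision. Human labels avoid this kind of error but cost more per example, which is the trade-off the study measures at scale.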
Similar Sources
Phenomenology of Place in Student's Life-World of Tehran University
The students' life-world, as that of the community's future reference groups, must be deeply explored and described in all its complexities, dimensions, elements, and details. Therefore, the places where students interact informally, as part of this life-world, are important in this study. The main purpose of the present study is to describe the students' mental perception of their informal inte...
A Data-driven Method for Crowd Simulation using a Holonification Model
In this paper, we present a data-driven method for crowd simulation with a holonification model. This extra module increases the accuracy of the simulation and produces more realistic agent behaviors. First, we show how to use the concept of a holon in crowd simulation and how effective it is. For this purpose, we use simple rules for holonification. Using real-world data, we model the...
Big Data Uses in Crowd Based Systems
There are currently many trends in computer science, like Smart Cities, Internet of Things, and Wireless Sensor Networks. Many of these systems require or could dramatically benefit from having information about crowds. First of all, many of the systems are built to improve the life of people, and they require information about them to be able to know when to activate their functionality in ord...
Finding the Number of Members with Certain Relationships in Social Networks and Big Size Organizations using Copositive Programming
In social networks and big size organizations, finding the number of members who all share a certain relationship w* is an important problem for managers, as is finding the number of members none of whom has that relationship (w*). Considering the members as vertices and the relationship as edges, w* and w* denote the clique number and the number of edges in the independent graph, re...
Cost-Efficient Querying Strategies for the Crowd
To enhance data processing, crowdsourcing is a mechanism that has evolved recently and has been adopted by companies that refine large amounts of data through so-called crowd workers. Generally, there are two challenges to crowdsourcing: First, the crowd is an uncertain input source, because a worker does not necessarily know the right answer or simply answers wrongly. Second, crowdsourcing comes ...
Publication date: 2012